Chapter 6: Assigning Pods to Nodes

How a Node is chosen for a Pod

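The scheduler retrieves Node information from the API server.
It selects a Node in three stages: Filter (drop Nodes that cannot run the Pod) > Score (rank the remaining Nodes) > Bind (assign the Pod to the best-ranked Node).
The Pod's record is updated through the API server, including which Node it is bound to.
The kubelet on the selected Node watches the API server and learns that a Pod has been assigned to it.
The kubelet drives the container runtime to create and start the containers.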

Node Selector

Assign Pods to Nodes using labels and selectors:
- Apply labels to Nodes
- The scheduler assigns the Pod only to a Node with a matching label

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: hello-world
    image: nginx
  nodeSelector:
    hardware: local_gpu

Add a label to a Node

kubectl label nodes k8s-worker1 hardware=local_gpu

Affinity

Node Affinity

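requiredDuringSchedulingIgnoredDuringExecution

A hard rule: the scheduler places the Pod only on a Node that satisfies the condition. If no Node satisfies it, the Pod stays Pending.

preferredDuringSchedulingIgnoredDuringExecution

A soft rule: the scheduler favors Nodes that satisfy the condition, but still schedules the Pod elsewhere if none does.

In both cases, IgnoredDuringExecution means that if a Node's labels change after the Pod is running and the condition no longer holds, the Pod keeps running there. Kubernetes does not evict it and reschedule it onto a matching Node.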

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution: []  # soft preferences go here (left empty in these notes)
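
To make the difference between the hard and soft rules concrete, here is a minimal sketch of a complete Pod that uses both. The Pod name, the container, the weight, and the reuse of the hardware=local_gpu label as a preference are assumptions chosen for illustration; only the required term comes from the notes above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard rule: the Node must be in one of these zones.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      # Soft rule: among eligible Nodes, favor those labeled hardware=local_gpu.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1                # 1-100; a higher weight counts more when scoring
        preference:
          matchExpressions:
          - key: hardware        # assumed label, reused from the Node Selector example
            operator: In
            values:
            - local_gpu
  containers:
  - name: nginx                  # hypothetical container
    image: nginx
```

If no Node in the two zones carries the preferred label, the Pod is still scheduled onto any Node that meets the required term.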

Pod Affinity

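Use case: services A and B communicate heavily over the network, so we want them placed as close together as possible (same host, same data center, same city). Conversely, data services C and D should be kept apart: if they share a host or data center and it fails, the application becomes completely unavailable, whereas if they are separated the application is degraded but still usable.

Both podAffinity and podAntiAffinity support the same two rule types:
requiredDuringSchedulingIgnoredDuringExecution
preferredDuringSchedulingIgnoredDuringExecution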

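spec:
  affinity:
    podAffinity:       # place this Pod near Pods that match the terms
      requiredDuringSchedulingIgnoredDuringExecution: []   # hard terms go here (left empty in these notes)
    podAntiAffinity:   # keep this Pod away from Pods that match the terms
      preferredDuringSchedulingIgnoredDuringExecution: []  # soft terms go here (left empty in these notes)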
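
The outline above can be filled in along the following lines. This is a sketch only: the app=service-a and app=service-b labels, the Pod name, the container, and the chosen topology keys are assumptions for illustration, loosely following the service A / service B scenario.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: service-a                # hypothetical name
  labels:
    app: service-a               # hypothetical label
spec:
  affinity:
    podAffinity:
      # Hard rule: run on the same host as a Pod labeled app=service-b.
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - service-b
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      # Soft rule: try not to share a zone with other service-a replicas.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - service-a
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: app                    # hypothetical container
    image: nginx
```

The topologyKey decides what "together" or "apart" means: kubernetes.io/hostname compares individual Nodes, while topology.kubernetes.io/zone compares zones.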

Taints

Node affinity lets a Pod choose which Nodes it runs on. Taints do the opposite: they let a Node repel Pods.

kubectl taint nodes g8node1 size=large:NoSchedule

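The effect field defines what happens to a Pod that does not tolerate the taint.

NoSchedule
- If any taint the Pod does not tolerate has effect=NoSchedule, the scheduler will not place the Pod on this Node
PreferNoSchedule
- If no untolerated taint has effect=NoSchedule but at least one has effect=PreferNoSchedule, the scheduler tries to avoid placing the Pod on this Node
NoExecute
- If any untolerated taint has effect=NoExecute, a Pod already running on this Node is evicted, and a Pod not yet running on it will not be scheduled there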
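
For reference, a taint added with kubectl taint is stored in the Node's spec.taints list. The fragment below sketches roughly what the relevant part of the Node object looks like after the size=large:NoSchedule command above. It is an excerpt rather than a full Node manifest, and real objects carry many more fields.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: g8node1
spec:
  taints:
  - key: size
    value: large
    effect: NoSchedule   # the other possible values are PreferNoSchedule and NoExecute
```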

Under certain conditions, Kubernetes applies built-in taints automatically:

node.kubernetes.io/not-ready: Node is not ready. This corresponds to the NodeCondition Ready being "False".
node.kubernetes.io/unreachable: Node is unreachable from the node controller. This corresponds to the NodeCondition Ready being "Unknown".
node.kubernetes.io/memory-pressure: Node has memory pressure.
node.kubernetes.io/disk-pressure: Node has disk pressure.
node.kubernetes.io/pid-pressure: Node has PID pressure.
node.kubernetes.io/network-unavailable: Node's network is unavailable.
node.kubernetes.io/unschedulable: Node is unschedulable.
node.cloudprovider.kubernetes.io/uninitialized: When the kubelet is started with "external" cloud provider, this taint is set on a node to mark it as unusable. After a controller from the cloud-controller-manager initializes this node, the kubelet removes this taint.

Tolerations

Tolerations are applied to Pods. A toleration allows the scheduler to place a Pod on a Node that carries a matching taint. It does not force the Pod onto that Node.

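The key and effect in a Pod's toleration must match those of the taint, and one of the following conditions must also hold:

operator is Exists (no value needs to be specified)
operator is Equal and the value is the same as the taint's value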
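
As an illustration of the matching rules, the sketch below shows a Pod that tolerates the size=large:NoSchedule taint from the Taints section. The Pod and container names are reused from the Node Selector example and are otherwise arbitrary.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: hello-world
    image: nginx
  tolerations:
  # Equal: key, value, and effect must all match the taint.
  - key: "size"
    operator: "Equal"
    value: "large"
    effect: "NoSchedule"
  # Alternative with Exists: matches any value of the "size" key.
  # - key: "size"
  #   operator: "Exists"
  #   effect: "NoSchedule"
```

With this toleration the Pod may land on the tainted Node, but it can equally be scheduled on any untainted Node. Combine it with a nodeSelector or node affinity if the Pod must run there.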

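Cordoning

Cordoning marks a Node as unschedulable. Once marked, no new Pods are scheduled onto it, but Pods already running on the Node are not affected.

kubectl cordon $NODE

Drain
- drain cordons the Node and then gracefully evicts the Pods running on it

kubectl drain $NODE --ignore-daemonsets

Uncordon
- marks the Node as schedulable again

kubectl uncordon $NODE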
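
A cordoned Node can be recognized in its object definition. The excerpt below is a sketch of the fields that change, assuming the Node is named k8s-worker1 as in the earlier examples: cordoning sets spec.unschedulable, and the control plane adds the built-in node.kubernetes.io/unschedulable taint listed in the Taints section.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: k8s-worker1              # assumed Node name
spec:
  unschedulable: true            # set by kubectl cordon (and by kubectl drain)
  taints:
  - key: node.kubernetes.io/unschedulable
    effect: NoSchedule           # added automatically while the Node is cordoned
```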

Manual Scheduling

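Manual scheduling means deciding directly which Node a Pod runs on by setting spec.nodeName. The Pod bypasses the scheduler entirely.

Can a tainted Node accept such a Pod? Yes for NoSchedule and PreferNoSchedule taints, because only the scheduler enforces them. A NoExecute taint still causes the Pod to be evicted unless it has a matching toleration.
Can a cordoned Node accept such a Pod? Yes, because cordoning only stops the scheduler from placing new Pods there.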

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  nodeName: k8s-worker1
  containers:
  - name: nginx-container
    image: nginx:latest
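
Setting nodeName skips the scheduler, but a NoExecute taint on the target Node would still evict the Pod. The sketch below assumes a hypothetical taint size=large:NoExecute on k8s-worker1 and adds the toleration that a manually placed Pod would need in order to stay there. The Pod name is also hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-pinned               # hypothetical name
spec:
  nodeName: k8s-worker1          # bind directly to this Node, bypassing the scheduler
  tolerations:
  - key: size                    # assumes the Node carries size=large:NoExecute
    operator: Equal
    value: large
    effect: NoExecute
  containers:
  - name: nginx-container
    image: nginx:latest
```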
